PostgreSQL 数据库性能 函数稳定态影响分区表分区剪枝

1 本章背景知识

PostgreSQL 函数有三种稳定态,具体可以看 PostgreSQL 数据库性能 函数稳定性影响函数调用次数

函数稳定态影响SQL 索引使用、分区剪枝。

本文通过展示稳定态对于分区剪枝功能的影响。

2 环境准备

CREATE TABLE t01_part(id INTEGER,part_date DATE,name TEXT)
PARTITION BY range(part_date)
(
  PARTITION part2023 VALUES LESS THAN (to_date('20240101000000','YYYY-MM-DD HH24:MI:SS')),
  PARTITION part2024 VALUES LESS THAN (to_date('20250101000000','YYYY-MM-DD HH24:MI:SS')),
  PARTITION part2025 VALUES LESS THAN (to_date('20260101000000','YYYY-MM-DD HH24:MI:SS')),
  PARTITION part2026 VALUES LESS THAN (to_date('20270101000000','YYYY-MM-DD HH24:MI:SS'))
);
INSERT INTO t01_part SELECT generate_series(1,2000000),now() + random()*1000 ,md5(random());

这里构建了一个以时间为分区列的表。
以下例子通过传入不同时间函数参数,验证分区剪枝情况。

3 函数稳定态与分区剪枝

先来看需要用到的两个函数的稳定态:

SELECT proname,provolatile FROM sys_proc WHERE proname in ('now','clock_timestamp');
//屏幕输出:
     proname     | provolatile
-----------------+-------------
 now             | s
 clock_timestamp | v

3.1 stable 函数可以使用分区剪枝

EXPLAIN SELECT * FROM t01_part WHERE part_date=now();

//屏幕输出:
 QUERY PLAN
-------------------------------------------------------------------
 Gather  (cost=1000.00..38819.49 rows=10251 width=44)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.00..36794.39 rows=4272 width=44)
         Subplans Removed: 3
         ->  Parallel Seq Scan on t01_part_part2023  (cost=0.00..4975.73 rows=1 width=45)
               Filter: timestamp without time zone = now()
(6 行记录)

可以看到 “Subplans Removed: 3”,也就是访问的分区扣除了3个。

3.2 volatile 函数无法使用分区剪枝功能

EXPLAIN SELECT * FROM t01_part WHERE part_date=clock_timestamp();
//屏幕输出:
QUERY PLAN
-----------------------------------------------------------------------
 Gather  (cost=1000.00..35718.11 rows=4 width=45)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.00..34717.71 rows=4 width=45)
         ->  Parallel Seq Scan on t01_part_part2024  (cost=0.00..12226.83 rows=1 width=45)
               Filter: timestamp without time zone = clock_timestamp()
         ->  Parallel Seq Scan on t01_part_part2025  (cost=0.00..12153.22 rows=1 width=45)
               Filter: timestamp without time zone = clock_timestamp()
         ->  Parallel Seq Scan on t01_part_part2026  (cost=0.00..5361.91 rows=1 width=45)
               Filter: timestamp without time zone = clock_timestamp()
         ->  Parallel Seq Scan on t01_part_part2023  (cost=0.00..4975.73 rows=1 width=45)
               Filter: timestamp without time zone = clock_timestamp()
(11 行记录)

可以看到,SQL 访问了所有数据分区。